Back read-only MAP_SHARED file mappings with MAP_PRIVATE #84

Merged
jserv merged 1 commit into sysprog21:main from Max042004:fix-mmap-shared-ro on Jul 2, 2026
@Max042004 Max042004 commented Jun 6, 2026

Collaborator

A MAP_SHARED, PROT_READ mapping of a file opened O_RDONLY could never be
installed. hvf_apply_file_overlay_quiesced() always mmap'd the host page
PROT_READ|PROT_WRITE and mapped the HVF segment RWX. On a read-only fd the
host mmap fails with EACCES (writable mapping of an O_RDONLY fd); forcing
PROT_READ then trips hv_vm_map(), because a MAP_SHARED mapping of an
O_RDONLY fd has macOS max_protection=READ and HVF cannot grant stage-2
rights (RWX) beyond the host region's max_protection (HV_ERROR).

This blocked every workload that maps a read-only file MAP_SHARED -- most
visibly the JVM, which maps its ~135 MiB lib/modules image exactly this
way and crashed on startup.

Choose the host backing from what the fd and the guest actually need:

  • guest wants PROT_WRITE: MAP_SHARED PROT_READ|PROT_WRITE (writes reach
    the file; an O_RDONLY fd still yields EACCES, matching Linux).
  • guest read-only on a writable fd: MAP_SHARED PROT_READ (max_protection
    is RWX, so the segment maps and cross-mapping coherence is preserved).
  • guest read-only on an O_RDONLY fd: MAP_PRIVATE PROT_READ. Its
    max_protection is RWX so the segment maps; the pages still show file
    content, and the guest's stage-1 tables keep the region read-only so
    the private copy is never dirtied -- no observable MAP_SHARED
    divergence for a read-only mapping.
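
The three cases above can be sketched as a small decision function. This is only an illustration of the rule as described here, not the code in the patch: `host_backing_t`, `choose_host_backing()`, and the `fd_writable` parameter are hypothetical names introduced for the example.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical pair of host mmap() arguments chosen for one overlay. */
typedef struct {
    int flags; /* MAP_SHARED or MAP_PRIVATE */
    int prot;  /* host protection bits */
} host_backing_t;

/*
 * Hypothetical helper: pick the host backing from the guest-requested
 * protection and from whether the fd was opened with write access.
 */
static host_backing_t choose_host_backing(int guest_prot, bool fd_writable)
{
    host_backing_t b;

    if (guest_prot & PROT_WRITE) {
        /* Writes must reach the file. On an O_RDONLY fd the host mmap()
         * itself is expected to fail with EACCES, matching Linux. */
        b.flags = MAP_SHARED;
        b.prot = PROT_READ | PROT_WRITE;
    } else if (fd_writable) {
        /* Read-only view of a writable fd: stay shared so other
         * mappings of the same file remain coherent. */
        b.flags = MAP_SHARED;
        b.prot = PROT_READ;
    } else {
        /* Read-only view of an O_RDONLY fd: a private mapping shows the
         * same file content and is never dirtied by the guest. */
        b.flags = MAP_PRIVATE;
        b.prot = PROT_READ;
    }
    return b;
}
```

A caller would pass the result straight to the host `mmap()`, for example `mmap(addr, len, b.prot, b.flags | MAP_FIXED, fd, off)`; whether the real overlay code uses `MAP_FIXED` there is an assumption.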

The guest-requested prot is threaded through hvf_apply_file_overlay(),
hvf_apply_file_overlay_quiesced(), and restore_file_overlay_range() so
every overlay install/restore site picks the correct backing.

Add test-mmap-shared-ro covering the O_RDONLY read path, a second
concurrent read-only mapping, EACCES on a writable request, and the
read-only-mapping-on-O_RDWR-fd branch.
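
For orientation, the Linux-visible behavior those cases check can be sketched with plain POSIX calls. This is not the contents of tests/test-mmap-shared-ro.c; the function name `check_shared_ro()`, the fixture string, and the `/tmp` path template are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 when every check passes, otherwise the number of the
 * first step that failed. */
static int check_shared_ro(void)
{
    static const char data[] = "read-only shared mapping";
    const size_t len = sizeof(data);
    char path[] = "/tmp/shared-ro-XXXXXX";
    void *a = MAP_FAILED, *b = MAP_FAILED, *c = MAP_FAILED, *w = MAP_FAILED;
    int rc = 0;

    int fd = mkstemp(path);
    if (fd < 0)
        return 1;
    if (write(fd, data, len) != (ssize_t) len)
        rc = 2;
    close(fd);

    int ro = open(path, O_RDONLY);
    int rw = open(path, O_RDWR);
    unlink(path); /* the open descriptors keep the file alive */
    if (rc == 0 && (ro < 0 || rw < 0))
        rc = 3;

    if (rc == 0) {
        /* O_RDONLY read path, plus a second concurrent mapping. */
        a = mmap(NULL, len, PROT_READ, MAP_SHARED, ro, 0);
        b = mmap(NULL, len, PROT_READ, MAP_SHARED, ro, 0);
        if (a == MAP_FAILED || b == MAP_FAILED ||
            memcmp(a, data, len) != 0 || memcmp(b, data, len) != 0)
            rc = 4;
    }
    if (rc == 0) {
        /* A writable shared request on an O_RDONLY fd must fail. */
        errno = 0;
        w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, ro, 0);
        if (w != MAP_FAILED || errno != EACCES)
            rc = 5;
    }
    if (rc == 0) {
        /* Read-only mapping on an O_RDWR fd. */
        c = mmap(NULL, len, PROT_READ, MAP_SHARED, rw, 0);
        if (c == MAP_FAILED || memcmp(c, data, len) != 0)
            rc = 6;
    }

    if (a != MAP_FAILED) munmap(a, len);
    if (b != MAP_FAILED) munmap(b, len);
    if (c != MAP_FAILED) munmap(c, len);
    if (w != MAP_FAILED) munmap(w, len);
    if (ro >= 0) close(ro);
    if (rw >= 0) close(rw);
    return rc;
}
```

Run natively on Linux, the function should return 0; run as a guest binary, it exercises the same four paths the description lists.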

(cherry picked from commit 337d39a4313109884112a86a0c4147bddfe18fa1)


Summary by cubic

Fixes read-only MAP_SHARED mappings of O_RDONLY files by using MAP_PRIVATE when needed, returns EACCES for MAP_SHARED|PROT_WRITE on read-only fds, and blocks mprotect(PROT_WRITE) upgrades to preserve Linux max_prot semantics. This unblocks JVM lib/modules and matches Linux behavior.

  • Bug Fixes

    • Select backing by requested prot and fd mode:
      • Read-only on O_RDONLY: MAP_PRIVATE | PROT_READ.
      • Read-only on writable fd: MAP_SHARED | PROT_READ.
      • Writable: MAP_SHARED | PROT_READ|PROT_WRITE (EACCES on O_RDONLY).
    • In sys_mmap (non-fixed), reject MAP_SHARED | PROT_WRITE on read-only fds early with EACCES (no silent snapshot).
    • Track read-only shared backing with backing_ro and reject mprotect(PROT_WRITE) with EACCES; keep this across snapshots and all sys_mremap paths; prevent region merges when backing_ro differs.
  • Tests

    • Add test-mmap-shared-ro covering: read-only MAP_SHARED on O_RDONLY, concurrent read-only mappings, EACCES on writable request, read-only mapping on an O_RDWR fd, and EACCES on mprotect(PROT_WRITE).

Written for commit bdde276. Summary will update on new commits.

cubic-dev-ai[bot]

This comment was marked as resolved.

Comment thread src/syscall/mem.c Outdated
Comment thread src/syscall/mem.c Outdated
Comment thread tests/test-mmap-shared-ro.c Outdated
jserv

This comment was marked as duplicate.

@jserv jserv left a comment

Contributor

Rebase onto the latest main branch, resolve any merge conflicts, and refine the changes based on the review feedback.

@Max042004 Max042004 force-pushed the fix-mmap-shared-ro branch 2 times, most recently from 8dcfec9 to c2aa4fa on July 2, 2026 10:35

A MAP_SHARED, PROT_READ mapping of a file opened O_RDONLY is common --
the JVM maps its ~135 MiB lib/modules image exactly this way. That case
is already fixed on main by 520568c ("Harden runtime around foot"),
which routes non-writable-fd MAP_SHARED requests through the
pread-snapshot fallback in sys_mmap's non-fixed path instead of
installing a live overlay.

That fallback fires for any MAP_SHARED request the overlay's
overlay_fd_writable() gate would reject, without checking the guest's
requested prot. Two Linux-visible corners were left open as a result:

  - mmap(MAP_SHARED, PROT_WRITE) of an O_RDONLY fd silently succeeded
    via the fallback instead of failing EACCES. Fixed by checking
    overlay_fd_writable() before falling through to pread, rolling back
    the allocation and returning EACCES when the guest asked for
    PROT_WRITE against a non-writable backing fd.

  - Once a read-only MAP_SHARED mapping succeeded, nothing stopped a
    follow-up mprotect(PROT_READ | PROT_WRITE) from upgrading it. Linux
    tracks max_prot per VMA from the fd's open mode and rejects that
    upgrade with EACCES; sys_mprotect only consulted prot_to_perms() and
    happily granted it, so a subsequent guest write landed in
    guest-local memory with no error ever surfaced to the caller. Fixed
    by adding guest_region_t.backing_ro, set on a MAP_SHARED region
    whenever its backing_fd lacks write access (the same
    overlay_fd_writable() check), threaded through regions_mergeable
    (so two regions with different backing_ro never silently coalesce),
    region_snapshot_t capture/restore, and all three sys_mremap
    region-recreation sites. sys_mprotect now rejects a PROT_WRITE
    request over any MAP_SHARED region with backing_ro set, before
    doing any PTE work.
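
The two checks above can be modelled in a few lines. This is a simplified sketch, not the code in src/syscall/mem.c: `region_model_t` is a cut-down stand-in for `guest_region_t`, `fd_is_writable()` is one plausible way a gate like `overlay_fd_writable()` could be written (its real implementation is not shown here), and the `model_*` functions are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumed stand-in for overlay_fd_writable(): true when the fd's open
 * mode includes write access. */
static bool fd_is_writable(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return false;
    int acc = fl & O_ACCMODE;
    return acc == O_WRONLY || acc == O_RDWR;
}

/* Hypothetical, trimmed-down region record with only the fields the
 * sketch needs. */
typedef struct {
    bool shared;     /* region came from MAP_SHARED */
    bool backing_ro; /* backing fd lacks write access */
    int prot;        /* current guest protection */
} region_model_t;

/* mmap-time check: refuse MAP_SHARED + PROT_WRITE on a read-only fd
 * instead of silently falling back to a snapshot. */
static int model_mmap_shared(region_model_t *r, int fd, int prot)
{
    bool writable = fd_is_writable(fd);
    if ((prot & PROT_WRITE) && !writable)
        return -EACCES;
    r->shared = true;
    r->backing_ro = !writable;
    r->prot = prot;
    return 0;
}

/* mprotect-time check: a read-only shared backing can never be
 * upgraded to writable; reject before touching any page tables. */
static int model_mprotect(region_model_t *r, int prot)
{
    if ((prot & PROT_WRITE) && r->shared && r->backing_ro)
        return -EACCES;
    r->prot = prot;
    return 0;
}

/* Merge check: regions that differ in backing_ro must stay separate,
 * otherwise the merged region would lose the restriction. */
static bool model_mergeable(const region_model_t *a, const region_model_t *b)
{
    return a->shared == b->shared && a->prot == b->prot &&
           a->backing_ro == b->backing_ro;
}
```

Keeping the flag on the region rather than re-querying the fd at `mprotect` time means the restriction survives even after the guest closes the descriptor, which is how a per-VMA max_prot behaves on Linux.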

test-mmap-shared-ro covers the O_RDONLY read path (already fixed by
520568c), a second concurrent read-only mapping, the O_RDONLY
mmap(PROT_WRITE) rejection, the mprotect(PROT_WRITE) upgrade rejection,
and the read-only-mapping-on-O_RDWR-fd branch. NPAGES is bumped from 64
(256 KiB, fits in one 2 MiB HVF segment) to 768 (3 MiB, crosses a
segment boundary) so the cases actually exercise
hvf_segment_split's multi-block path.
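
The NPAGES arithmetic can be checked with a short sketch. The 4 KiB page size follows from 64 pages being 256 KiB, and the 2 MiB segment size is stated above; the helper names and the choice of a segment-aligned start address are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES    ((size_t) 4096)            /* 256 KiB / 64 pages */
#define SEGMENT_BYTES ((size_t) 2 * 1024 * 1024) /* one 2 MiB HVF segment */

/* Size in bytes of a mapping that covers npages pages. */
static size_t mapping_bytes(size_t npages)
{
    return npages * PAGE_BYTES;
}

/* Number of 2 MiB segments that [start, start + len) overlaps.
 * len must be non-zero. */
static size_t segments_touched(uintptr_t start, size_t len)
{
    uintptr_t first = start / SEGMENT_BYTES;
    uintptr_t last = (start + len - 1) / SEGMENT_BYTES;
    return (size_t) (last - first + 1);
}
```

With a segment-aligned start, 64 pages stay inside one segment while 768 pages (3 MiB) overlap two. Because 3 MiB exceeds the segment size, the larger mapping crosses a boundary at any alignment.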

@Max042004 Max042004 force-pushed the fix-mmap-shared-ro branch from c2aa4fa to bdde276 on July 2, 2026 10:44
@jserv jserv merged commit bbc18f7 into sysprog21:main Jul 2, 2026
4 checks passed
jserv commented Jul 2, 2026

Contributor

Thank you, @Max042004, for contributing!

2 participants